Fuzzification of Agglomerative Hierarchical Crisp Clustering Algorithms

نویسندگان

  • Mathias Bank
  • Friedhelm Schwenker
چکیده

User generated content from fora, weblogs and other social networks is a very fast growing data source in which different information extraction algorithms can provide a convenient data access. Hierarchical clustering algorithms are used to provide topics covered in this data on different levels of abstraction. During the last years, there has been some research using hierarchical fuzzy algorithms to handle comments not dealing with one topic but many different topics at once. The used variants of the well-known fuzzy c-means algorithm are nondeterministic and thus the cluster results are irreproducible. In this work, we present a deterministic algorithm that fuzzifies currently available agglomerative hierarchical crisp clustering algorithms and therefore allows arbitrary multi-assignments. It is shown how to reuse well-studied linkage metrics while the monotonic behavior is analyzed for each of them. The proposed algorithm is evaluated using collections of the RCV1 and RCV2 corpus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based approach

We present two results which arise from a model-based approach to hierarchical agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms – Ward’s method, single-link, complete-link, and a variant of group-average – are each equivalent to a hierarchical model-based method. This interpretation gives a theoretical explanation of the empirical b...

متن کامل

2 Review of Agglomerative Hierarchical Clustering Algorithms

Hierarchical methods are well known clustering technique that can be potentially very useful for various data mining tasks. A hierarchical clustering scheme produces a sequence of clusterings in which each clustering is nested into the next clustering in the sequence. Since hierarchical clustering is a greedy search algorithm based on a local search, the merging decision made early in the agglo...

متن کامل

Algorithms for Model-Based Gaussian Hierarchical Clustering

Agglomerative hierarchical clustering methods based on Gaussian probability models have recently shown promise in a variety of applications. In this approach, a maximum-likelihood pair of clusters is chosen for merging at each stage. Unlike classical methods, model-based methods reduce to a recurrence relation only in the simplest case, which corresponds to the classical sum of squares method. ...

متن کامل

Implementation of Hybrid Clustering Algorithm with Enhanced K-Means and Hierarchal Clustering

We are propose a hybrid clustering method, the methodology combines the strengths of both partitioning and agglomerative clustering methods. Clustering algorithms that build meaningful hierarchies out of large document collections are ideal tools for their interactive visualization and exploration as they provide data-views that are consistent, predictable, and at different levels of granularit...

متن کامل

Space Distortion and Monotone Admissibility in Agglomerative Clustering

This paper discusses the admissibility of agglomerative hierarchical clustering algorithms with respect to space distortion and monotonicity, as defined by Yadohisa et al. and Batagelj, respectively. Several admissibilities and their properties are given for selecting a clustering algorithm. Necessary and sufficient conditions for an updating formula, as introduced by Lance and Williams, are pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010